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ABSTRACT 

The widely used nuclear norm heuristic for rank minimiza¬ 
tion problems introduces a regularization parameter which is 
difficult to tune. We have recently proposed a method to ap¬ 
proximate the regularization path, i.e., the optimal solution as 
a function of the parameter, which requires solving the prob¬ 
lem only for a sparse set of points. In this paper, we extend 
the algorithm to provide error bounds for the singular values 
of the approximation. We exemplify the algorithms on large 
scale benchmark examples in model order reduction. Here, 
the order of a dynamical system is reduced by means of con¬ 
strained minimization of the nuclear norm of a Hankel matrix. 

Index Terms — Nuclear norm heuristic, regularization 
path, singular value perturbation, model order reduction. 

1. INTRODUCTION 

Rank minimization has important applications in e.g. signal 
processing, control, machine learning, system identification, 
and model order reduction. The matrix argument can e.g. be 
a covariance matrix (as in sensor array processing and mul¬ 
tivariate statistical data analysis) or a structured matrix such 
as a Hankel matrix (as in system realization), [?]. Specif¬ 
ically, application areas include spectrum sensing [?], signal 
time delay estimation [?], phase retrieval of sparse signals [?], 
wireless network inference [?], channel equalization [?], etc. 

In general, the rank minimization problem is non-convex 
and NP-hard [?]. However, a common convex heuristic for 
these problems is nuclear norm minimization. The nuclear 
norm H-H^ = ™rn of the singular values, is 

used as a convex surrogate for the non-convex rank function; 
this is so because the nuclear norm can be interpreted as a 
convex relaxation of the rank, since it is the pointwise tightest 
convex function (called a convex envelope [?]) to lower-bound 
the rank, for matrices inside a unit spectral-norm ball. 

Consider a general case of minimization of the nuclear 
norm of a linear map subject to a quadratic constraint; 

This work was supported by the European Research Council under the 
advanced grant LEARN, contract 267381, and by the Swedish Research 
Council under contract 621-2009-4017. 


minimize ||A(x)||^ 
subject to ||x — Xolla < A, 


( 1 ) 


where A : R" —>■ is a linear map (for simplicity, from 

now on we treat the symmetric case, p = g), a; S R" is the 
decision variable, and A is the regularization parameter. 

Note that the formulation in Q belongs to a subclass of 
regularized nuclear norm optimization problems. Other for¬ 
mulations include exchanging cost and constraint or the pe¬ 
nalized version [?]. In addition, our theory can readily be 
extended to weighted norms, ||a;||T 4 ? := x^Wx. Then, the 
quadratic constraint is equivalent to the general quadratic in¬ 
equality x'^Px + q^x -f r < 0. 

The key issue here is that, although regularized nuclear 
norm minimization has been thoroughly studied, it suffers 
from the fact that the dependence of the solution on the reg¬ 
ularization parameter is difficult to predict. Without, in gen¬ 
eral, a priori knowledge on how to choose A, we are moti¬ 
vated to study the so called regularization path, i.e., the opti¬ 
mal solution as a function of the regularization parameter. For 
problem Q the regularization path is dehned on the domain 


A G (Amin, Amax) := (0, ||a;o|| 2 ), (2) 

since for A = 0 the solution to 0 is known, = Xo, and 
for A > ||a;o ||2 the constraint set is large enough to include 
the unconstrained minimum, = 0. 

For practical purposes the domain of the regularization 
path must be discretized, which raises the question of how to 
choose the grid points. This is indeed an important question 
since problem 0 can be computationally costly to solve. 

To address this problem, in [?], we presented a method 
to choose the grid points based on a worst-case approxima¬ 
tion error when the optimal solution for A, is approxi¬ 
mated by x°^^ for A* < A. The idea is visualized in Figure 
1. Given the solution for some A*, we increase A beyond 
A* until the worst-case approximation eri'or reaches a pre¬ 
specified tolerance, e, and then we re-evaluate 0 - Iteratively, 
starting for AJ = 0, this procedure generates a set of grid 
points, X*,i = 1,... ,m, and an approximate regularization 
path such that the approximation error is within e for all A. 


\\A(x 



X 


Fig. 1. Illustration of regularization path algorithm proposed in [?]. x-axis: regularization parameter, A. y-axis: cost of Q. The 
true regularization path (red) is guaranteed to lie in the shaded zone. The approximate path (blue) is guaranteed to differ by at 
most e from the true path. 


The novelty of this paper consists of two new algorithms. 
The first gives a guarantee on the cost function of Q. The 
second gives a guarantee on the singular values of 
when is approximated by x^’’*. Furthermore, we derive 
upper bounds on the number of grid points needed by the al¬ 
gorithms to meet a tolerance e. 


Now, assume that x^’’* solves (j^ for some parameter value 
A = A*. Then, since the nuclear norm is convex we can write, 
for any matrix A{x) G the inequality 

M ( x)iL > \\a{x°;^!)\1 + {u^.v^. + w,a{x)-a{x°;^!)) 

= ||Al (x””*') 11^ + A*{Ux*V^. + Wf (x - xlK) , 


2. ERROR BOUNDS FOR APPROXIMATION OF (I) 

In this section we derive error bounds that allow us to confine 
the true regularization path within a certain region (the shaded 
area in Figure 1). 

Define the singular values of Al(x^’’'), where x^*" is opti¬ 
mal for Q for parameter value A, as 

=■ ■ 

For further use in the below presented Algorithms 1 and 2, 
respectively, we derive upper bounds on the quantities: 

11^ «”*) L -11“^ i^T) L H > (3,4) 

i 

where x^’’* is given. The bounds can be viewed as worst-case 
approximation errors in the singular values when x^’’* is ap¬ 
proximated by x^’’*. 

2.1. Relaxation of (1) using subgradients 

We here relax problem Q using subgradients of the nuclear 
norm. The concept of subdifferentials (or sets of subgradi¬ 
ents) is a generalization of the gradient that applies to func¬ 
tions whose gradient is undefined in some point or points, [?]. 
In the case of the nuclear norm, the subdifferential is (see e.g. 
[?]): 

d ||A||^ = {UV^ + W : U^W = W = 0, ||IU|| < l} , 

where X = UXV'^ £ ippxp is a compact singular value de¬ 
composition W G U'^W = W'^V = 0 implies that X 

and W must have orthogonal row and column spaces. 


where(7A*TA^^+IF G ^ II^ILIx=yt(a;“^‘’l)’ ("4,-B) = Tr 

is the standard inner product, and A* is the adjoint operator 
of A. For shorter notation we define 

A*iUx.V^. + Wf =: ax^iWf. (5) 


To sum up, the above inequality becomes 
M(x)||^ > ||A(x 7.0L+aA*(f^)^ , (6) 


which implies that for A > A* the optimal argument x^*’' must 
lie in the half-space {x : a\*{W f (x — x^’’*) < O}. 

Using the inequality in (|^ we can relax Q into 

min IIA (x^*) ||,, + axfWf (x - x^.^) 

(7) 

s.t. ||x — X 0 II 2 < A. 


Problem Q is solved analytically in the following lemma: 
Lemma 2.1. Problem Q has the optimal solution 


opt, rlx 


= Xo 


A 

hxfw)f 


ax*{W), 


and optimal cost 

||AlK'’*')L +«A*(fU)^ (x„ - xf) - A ||aA*(IU)|| 2 . (8) 


Proof. At optimum the constraint is tight and the negative 
gradient of the cost function, —ax*{W), is proportional to 
the outward pointing normal of the constraint set. This gives 


^opt,rlx jjjjQ jjjg Qf Q gives 


□ 


2.2. Bound on cost function approximation error, ([^ 

Using we can upper bound the approximation error in Q 










Theorem 2.1. The approximation error in Q (i.e., the cost 
function approximation error) for any A is upper-bounded by 
the function d\* (A, W), as 


|MK':')IL - WMxtn. < 

(xo-xT) =■■ i'a-IA.H'). 


Proof. The theorem follows from the fact that, for any A, 
11 ^ is lower bounded by the optimal cost in dSll. □ 


Remark 2.1. In Section we present a Frank-Wolfe algo¬ 
rithm for optimizing 0 over W. Furthermore, it can be ver¬ 
ified that there is some W°‘’‘ such timt d\* (A*, W^^') = 0, by 
taking W°p' = W'^ according to {13\. 


Remark 2.2. In resemblance with /? ], the function d\* (A, W) 
can be interpreted as a duality gap, since the relaxation made 
in 0 relates to the Frank-Wolfe algorithm [I] when seen as 
a primal-dual method. 


Lemma 2.3. There exists aW — W-^ such that ax* 

(see ([^j is proportional to the error vector (xo — x’^r), i.e., 

ax* {W^) = 7 (a;o - x°^l) , (13) 

for some scalar 7 > 0 . 

Proof. The proof is in the Appendix. □ 


Lemma 2.4. 


I opt opt 11 ^ 

Fa* ~ ^A II2 


< A" - (A*) . 


(14) 


Proof. Due to the existence of ax* in • 13 i, is con¬ 

strained by the convex set 


Ci(X) :={ 


= ix : \\x - Xo\\2 < 


X,ax* {x - x°y) < o|, 


2.3. Bound on singular value approximation error, Q 

Next, we derive an upper bound on the error in Q. This 
bound will be the minimum of two separate bounds. The first 
of these is as follows: 


Lemma 2.2. 



I 


< 



where is the i’th unit vector, i.e., has zeros every¬ 
where except at the i ’th component which is one, and i"”" = 
arg min , z = 1 ,..., p. 

Proof. For A > A*, ||aI ||^ < ||aI ||^. Hence, 

P t 2 

corresponds to the maximum of ^ (cr^ — cr^) subject 

i 

to < ||AI(x 7 *)||^ , which is reached by making o-jmm as 

i 

large as possible and tJi = 0 for z F t™"- Q 


(10 


so an upper bound is max llxF* — xll^. This maximum 

can be solved geometrically. Since x^’’* is inside the ball of 
the first constraint of C 3 J“*(A), this constraint has to be tight 
at the optima. Furthermore, with the first constraint being 
tight, the vectors (x — Xo), (x^T — Xo), and (x^l' — x) form 
a triangle, with ||x — X 0 II 2 = A and ||x^’’.^ — X 0 II 2 = so 

A^ = ||x^'’* ~ ^I|2 3" ~ 2 II^A* ~ 

according to the law of cosines, where z; > 0 is the angle 
between x^’’* — x and the hyperplane 

|x : ax* {W^)^ (x - x^*) = o|. 

This expression is maximized for x = 0 giving the result 
||x^'’* — x°P ^||2 = A^ — (A*)^. (In fact, z; = 0 implies that 
the second constraint in (A) is also tight.) □ 


Now, we derive a second upper bound, which is comple¬ 
mentary to the above. To do this, consider the perturbation 

A {xyy = A {xf) +E- E:= A {x°y - xT) , (11) 

which is valid since A is linear in x. Then, according to 

pOP' 

'-A 
P 

'/..A* 


Mirsky’s theorem [?] the singular values of A{x°y) obey 




where, due to equivalence of finite-dimensional norms [?], 


\\Efp = ||Al(x°'’* - x7')||^ < Ca ||<'’» - X 


opt I 


( 12 ) 


for some constant Ca depending on A. 

Furthermore, we bound 11 x°y — x^^* 11 ^ 
low. For this we need the following lemma: 


in Lemma 


2.4 


be- 


Combining (10 1 , (12 1 , and (14 1 , we obtain the following 
upper bound on the approximation error in (|^: 

Theorem 2.2. The approximation error in ^ is upper 
bounded by the function sx* (A).' 


P / 

i:( 


ptA* ppA 

cr, — (J, 


< 


< min 
=: sa*(A). 


cr^* - ||aI (x°':') 


,CA{X^-{X*f) 


(15) 


Proof. The first argument in the min is given by (lOi. The 
second is obtained by combining (12 1 and ([l4|. □ 




















3. ALGORITHMS 


3.1. Model order reduction 


In model order reduction, and approximative filter design, the 
aim is to reduce a high-order model description to a low-order 
model while preserving the properties according to some fit 
criterion. 

We consider a known Finite Impulse Response (FIR) 
model of a stable scalar discrete-time linear time-invariant 
dynamical system, denoted by go € R", which is a vector 
containing its impulse response coefficients. Furthermore, 
we denote the low-order candidates by g, and consider the 
H 2 model fit criterion \\g — 50 II 2 ^ Note that other criteria 
commonly used in model order reduction are the Hoq- and 
Hankel norm-criteria (see [?] or [?]), which are not consid¬ 
ered here. 

It can be shown [?] that the following Hankel matrix (here 
taken to be symmetric for simplicity) 


nig) 


91 92 

92 93 


9p 

9p+i 


9p 9p+i ■ ■ ■ 9n 


(16) 


has the property that its rank is equal to the order (McMillan 
degree) of the dynamical system which has g as impulse re¬ 
sponse. This motivates the Hankel matrix rank minimization 
problem to enforce a low system order. 

Using the nuclear norm as surrogate for rank and the H 2 
model fit criterion, we formulate the following special case of 

Q: 


minimize 11^(3)11,, 

9 

subject to ||p-po|l 2 <A- 


(17) 


Note that in this setting a ('H(p)) are the Hankel singular val¬ 
ues of the system with g as impulse response. 

The adjoint of the Hankel operator in ( [T^ , %* (A), maps 
matrices X S Rp^p to vectors x G R", by summing the anti¬ 
diagonals of X, i.e.. 


n*(X)=x; Xk= Y. 


3 . 2 . The algorithms 

The algorithms are outlined in Algorithm 1 and 2. The idea is 
to adaptively choose a set of discretization points, for which 
problem Q is solved. In the intermediate intervals the regu¬ 
larization path is approximated by the previous solution (ob¬ 
tained on the infimum of the current interval). The resulting 
approximation errors are upper bounded in (j^ for Algorithm 
1 and ( [T5| ) for Algorithm 2. The discretization points are cho¬ 
sen as the values of A for which the upper bound reaches a 
pre-specified error tolerance, e. This is visualised in Figure 2. 


Note that in Algorithm 1, dx* (A, W) depends on W. For 
simplicity, we can set W = 0, but in Section we also 
demonstrate how to optimize dx* (A, W) over W. Also note 
that for a Hankel matrix the quantity = n satisfies 0. 

Algorithms 1 and 2. Approximate regularization paths. 

Input: go,£- 

Output: Approximate regularization paths such that er¬ 
rors (|^ < e (Alg. I)or0 < £ (Alg. 2) for A = [0, Amax]- 
Initialize i = 0. Set AJ = 0. 
while A* < Amax do 

Solve for A = A*, giving —>■ = 

^iAxT))- 

Solve A*^i from dx* (A*+i, W = 0) = £ (Algorithm 1) 
or sx* = £ (Algorithm 2). 

Accept x‘^1 as approximate solution for A = [A*, A*_|_]^). 
Set z = f 4-1. 

end while 


3.2. J. Number of evaluations for Algorithm 1 

Here we bound the number of evaluations of ([T]|, i.e., the num¬ 
ber of iterations of the above algorithm needed to guarantee 
the error ([^ within the tolerance e. 

Theorem 3.1. The number of evaluations 0/0 needed by 
Algorithm 1 is at most 

M^igi < 

in general, and ifW = 

^w=o ^ 
iVlAlgl s ^ 

where £ is the tolerance and 

p-i \ 5 

k=l / 

in which n = 2p — 1, Ipxp is a {p x p)-matrix of ones, and 
the adjoint of the Hankel operator is defined in ( [7^| ). 

Proof. The proof is in the Appendix. □ 



2Cn llffol 


0 .- 




n \\hio \\2 


= Oi£-^), 




(19) 




3.2.2. Number of evaluations for Algorithm 2 


Now, we bound the number of evaluations of 0, i.e., the 
number of iterations of Algorithm 2 needed to guarantee the 
solution within the tolerance e. 


Theorem 3.2. The number of evaluations 0/0 needed by 
Algorithm 2, i.e., the number of iterations, is at most 


MAlgl < 


n\\9o\\l 

e 


O(e-i). 


( 21 ) 


Proof. The proof is in the Appendix. 


□ 















benchmark 

order 

Tg 

n 

cpu ADMM 

g.min 1 j max 

Algorithm 1 

e/J”“ TO 

Algorithm 2 

M TO 







0.2 

5 

3062 

30 

10 

beam.mat 

348 

1 

1047 

134.23 

0.1233 












0.3 

3 

4082 

20 

5 







0.2 

7 

1035 

30 

10 

build.mat 

48 

0.025 

576 

317.86 

0.1607 












0.3 

4 

1379 

20 

5 







0.2 

5 

643 

30 

9 

eady.mat 

598 

0.1 

196 

7.36 

0.0958 

0.3 

3 

856 

20 

4 







0.2 

5 

482 

30 

12 

heat — cent .mat 

200 

0.5 

139 

11.78 

0.7270 












0.3 

3 

642 

20 

7 







0.2 

5 

1193 

30 

7 

pde.mat 

84 

0.0001 

242 

39.80 

0.1054 

0.3 

3 

795 

20 

3 


Table 1. Results of Algorithm 1 and 2. Tg is sampling time in Matlab’s c2d, giving impulse response lengths n. ’cpu ADMM’ 
is an average time in seconds with a standard laptop for solving (17 1 using ADMM. The maximum cost := \[H{go)\\^- m 
is number of grid points needed, with upper bounds M (for Algorithm 1 we use (20 1 ). e™" is the minimum tolerance for which 
dx* (A*, W = 0) < e™” for all A*. For Algorithm 2, e = n ||po|l 2 /-^Aig 2 - 


4. IMPLEMENTATION 


where ai,Ui, Vi are given by the singular value decomposition 


For large scale problems ([T]) we suggest an Alternating Direc¬ 
tion Method of Multipliers (ADMM), c.f. [?] and [?]. We will 
follow the method in [?] with a modihcation for the p-update 
in ( |24l l below. 

First, we rewrite as 

minimize IliFlL 

gGlR",rrGKPX!> 

subject to \\g - go \\2 < A *^22) 

nig) = H. 

Next, we form the following augmented Lagrangian 


LpiH,g,Z) = 

||i7|L + Tr {Z^inig) - H)) + ^ ||7f ( 5 ) - H\\l. 


nig^) + -z'^ 

P 


^ OiU.vJ. 


The second subproblem, (24 1 , we reformulate as 


minimize i)x'^Px + q^x 

X ^ 

subject to Ikll2 ^ 


where x = g — go, P = diag('H*(lpxp)) and q = n*{Z^ + 
pHigo) — and %*{■) is dehned in (18 1 . This can be 

solved by using the facts that the optimal point, lies on 
the boundary of the constraint set, and in this point the neg¬ 
ative gradient of the cost function is normal to the constraint 
set, i.e., it is proportional to This means that 


The strategy is to update the variables as 

■= arg min Lp(i7, g’^, Z’^) (23) 

H 

/■+!:= arg min LpiH^+\g,Z^) (24) 

{3:||9-9o|l2<'>'} 

■-Z^ + pinig^^^) - (25) 

The variables can be initialized e.g. as H = 0, g = 0, Z = 
0,p = 1. (Initialize p if it is adaptive as in [?]). 

The H update in ( [23| is accomplished in [?] using so 
called ’singular value soft-thresholding’: 


= arg min Lp{H, Z'^) 

H 

= arg min (||i7||, + f ||ff - - illp)Z%) 

= y^max|o, cr^ - UivJ, 


pPa;°P' + q = a;°P‘ = -{pP P tI)-\ 

where t > 0 is a scalar determined from solving f{t) := 
||a;°P '^||2 = A using Newton’s method. This t is unique since 

( n 2 

is a decreasing function with /(O) > A and /(oo) = 0. The 
fact that /(O) > A is true since a;'’P'^(t = 0) is the global min¬ 
imum, which is located outside the constraint set. Summing 
up, we obtain 



g'‘+^ =go-(pP + tI)-^q- (26) 

The stopping criterion is Hrp+^H < and < 

where the primal and dual residuals (rp and ?’jj) and tol¬ 
erances (cp and Cd) are computed from the definition in [?, 














Sec. 3] as 


r^+i :=-H(/+^) 

:= pn*{H^ - h'^+^) 

ep+^ :=peabs + erei max {||'H( 5 ''+^) ||^ , ||77'=+^ ||^} 
4 +^ :=yneabs + erei||H*(Z'=+i)||2. 

4.1. Frank-Wolfe algorithm for optimizing ([^ over W 

The Frank-Wolfe algorithm (or conditional gradient method) 
is a simple iterative method, suggested in [?] (1956) for min¬ 
imizing convex, continuously differentiable functions / over 
compact convex sets. We here design a Frank-Wolfe algo¬ 
rithm for optimizing (j9]) over W. Our algorithm is summa¬ 
rized in Algorithm 3. 

To solve the argument minimizations at each iteration ex¬ 
plicitly, we note that for the constraints in Ml 

u'^w = 0^W = U-^A, 

and 

WV = U-^AV = 0<^AT/ = 0<^A = D{V4'^, 
for some matrices A and D of appropriate size. Hence, 

W = U^D{V^)'^, 



X 10^ 



Fig. 2. True errors (red) and confidence zones (grey) for 
the model build.mat. Upper: Algorithm 1 with approx¬ 
imate regularization path (blue) for e/J™“ = 0.3, where 
jmax _ Lower: Algorithm 2 for e = n\\go\\l/M, 

where M = 30. 


where ||kF|| < 1 => ||i9|| < 1. Then, in Algorithm 3, 
we parameterize X = U-^D{V-^Y'^ so that {X,C) = 
{U^D{V^)'^,C) = {D,{U4^CV4 =: {D,C), and 
solve 

arg min(i9, (7), 

This problem has the closed form solution i9°P* = 
where C = is a compact singular value decompo¬ 

sition. Then, 



A:°p‘ = (7-^£)°P‘(U-^)^ = . 

Finally, when optimizing the duality gap over W for a 
fixed A, we have 


C = 


ddx*{\,W) 


dW 


W=W'‘ 


-n {hx* )+'H - go) 


Algorithm 3. Frank-Wolfe for optimizing (|^ over W 
Initialize = 0 
fork = 0,l,...,Kdo 


Compute Ar°P' := 

Update := 

end for 


{x,cy,c= 

X^M 

(1 - 7)IU'= -f 7 -A°p‘, for 7 = ^ 


W=W>’ 

2 

2+k 



Fig. 3. Plot of significant singular values ai,i = 1,..., 17, 
for beam.mat. Vertical lines indicate grid points. Upper: Al- 
gortihm 1 (e/J™“ = 0.2, where = |i'H((?o)|i*)- Lower: 
Algorithm 2 (e = n\\go \\2 /M, where M = 40). 























































Fig. 4. Plot of significant singular values ai,i = 1,..., 17, 
for becun.mat. Vertical lines indicate grid points. Upper; Al- 
gortihm 1 (e/J™“ = 0.3, where J™“ = 1177(50)11*)- Lower: 
Algorithm 2 (e = n II 50 II 2 /M, where M = 30). 


Fig. 6. Plot of significant singular values (7^,7 = 1,..., 26, 
for build.mat. Vertical lines indicate grid points. Upper; Al- 
gortihm 1 (e/J™“ = 0.3, where = 1177(50)11*). Lower: 
Algorithm 2 (e = n II 50 II 2 /-L7, where M = 30). 




Fig. 5. Plot of significant singular values ai,i = 1, ..., 26, 
for build.mat. Vertical lines indicate grid points. Upper: Al- 
gortihm 1 (e/J™“ = 0.2, where J™™ = ||77(5o)||*). Lower; 
Algorithm 2(e = n II 50 II 2 /M, where M = 40). 




Fig. 7. Plot of significant singular values ai,i = 1,..., 20, 
for eady.mat. Vertical lines indicate grid points. Upper: Al- 
gortihm 1 (e/J™“ = 0.2, where = ||77(5o)||*). Lower; 
Algorithm 2 (e = n WgoWl where M = 40). 

















































































































Fig. 8. Plot of significant singular values ai,i = 1, ..., 20, 
for eady.mat. Vertical lines indicate grid points. Upper; Al- 
gortihm 1 (e/J™“ = 0.3, where J™“ = 11 ^( 50 ) 11*)- Lower: 
Algorithm 2 (e = n \\go \\2 where M = 30). 




Fig. 9. Plot of significant singular values ai,i = 1,..., 6, 
for heat — cent.mat. Vertical lines indicate grid points. Up¬ 
per; Algortihm 1 (e/J™™ = 0.2, where J™“ = ||'H(po)||*)- 
Lower: Algorithm 2 (e = n\\go \\2 /M, where M = 40). 




Fig. 10. Plot of significant singular values ai,i = 1,..., 6, 
for heat — cent.mat. Vertical lines indicate grid points. Up¬ 
per: Algortihm 1 (e/J”” = 0.3, where = \\nigo)\0. 
Lower; Algorithm 2 {e = n\\go\\\ /M, where M = 30). 




Fig. 11. Plot of significant singular values Ui, i = 1,..., 6, 
for pde.mat. Vertical lines indicate grid points. Upper: Al¬ 
gortihm 1 (e/J™“ = 0.2, where = ||'H(5o)IL)- Lower; 
Algorithm 2 (e = n WgoWl where M = 40). 































































































Fig. 12. Plot of significant singular values Oi^i = 1,..., 6, 
for pde.mat. Vertical lines indicate grid points. Upper: Al- 
gortihm 1 {ej J™™ = 0.3, where J™“ = ||'H(5o)||*)- Lower: 
Algorithm 2{e = n \\go\\\ /-W. where M = 30). 


first part of the regularization path; it stops gridding when the 
first argument in ( [T5] l becomes active. Thus, the algorithms 
are suitable in cases where the singular value drops happen 
for low values of A, which is a general observation we have 
made when studying these benchmark examples. 


6. CONCLUSION 

We have proposed a method to approximate the regularization 
path of a quadratically constrained nuclear norm minimiza¬ 
tion problem, with guarantees on the singular values of the 
matrix argument. The algorithms solve the problem for a fi¬ 
nite, explicitly upper-bounded, number of values of the regu¬ 
larization parameter. We have also provided details regarding 
efficient implementation of the algorithms. 

The results show that the algorithms generate grid points 
that are suitable for tracking changes in the singular values. 


7. APPENDIX 


5. RESULTS 


Algorithm 1 and 2 are implemented on single-input-single- 
output model order reduction benchmark^ The continuous¬ 
time impulse response sequences are discretized using the 
Matlab command c2d with sampling times listed in Table 1. 
We here set W = 0, but design a Frank-Wolfe algorithm for 
optimizing Q over W in Section]^ The tolerances e are cho¬ 
sen as a fraction the maximum possible cost of in Algo¬ 
rithm 1, and according to e = n ||go ||2 /AfAig 2 for Algorithm 
2 . 


For large scale problems Q we suggest an Alternating 
Direction Method of Multipliers (ADMM), c.f. [?], [?]. Our 
method is similar to [?] and provided in Section]^ 

The results are summarized in Table 1. We observe that, 
for Algorithm 1, the bound MAigi is very loose. We also see 
the smallest possible e such that d\*{X*,W = 0) < £ for 
any A* € (Amin, Amax)- For the system heat — cont.mat this 
extreme value is high, but we observe that for most part of 
the regularization path W = 0) is very low; it only 

increases very close to Amax- For smaller values of e in Algo¬ 
rithm 1, we may optimize over W. Then, it is possible t o us e 
arbitrarily small e, since dx* (A*, W-^) — 0 (see Remark 2.1 1 . 
In Figure 2 we illustrate the ideas in Algorithm 1 and 2. 

In Figures 3-12 we visualize the grid points for Algorithm 
1 and 2, respectively, when applied to the model the different 
benchmark models. For Algorithm 1, we generally see that 
the grid points are slightly more dense in for smaller values of 
A. For Algorithm 2, the grid points are also more dense in the 


Proof of Lemma [O] The existence of vectors proportional to 
Xo — x°^l can be proved using the optimality conditions of 
1^, which imply that the minimizer of (j^, minimizes 
the Lagrangian L{g^ z) for some Lagrange multiplier z > 0 
[?, pp. 217], i.e., 

L{x,z)\^^^v 

— dx ll-- 4 .(a:)II ^dxZ ||a; a:o||2l^_2,“p> ■ 

A* A* 

The second term is computed explicitly as 

dxZ \\x — a^o|l 2 la;=a:°)’l “ 7| "Apt Tj” (*A* ~ ^o) , 

A* rT ^ — T* 

11 A* 11 2 


so that 


opt 


{xo - xZ') e dx ||Al(a:)||J^^ « , 


showing that there is a subgradient proportional to Xo — a:/*. 

□ 


Proof of Theo rem\3.1\ Consider the choice of h\*{W) = 
hx* (tU-L) in JBi. Then, for A > A*, 


dx* (A, 1U°P‘) < dx* (A, W^) = II/ix* {W^) II2 (A - A*), 


^The model reduction benchmarks are available at slicot.org/20-site/126- 
benchmark-examples-for-model-reduction 


in which we will now bound ||/ix* H^. To do this, we 



























can bound 


\\hx*(W)\\^= \^^{Tr{UV^ + WfHkYj = 

1 

/ n \ 2 / n 

^ (Tr VU^Hk + Tr < E \\HkLf 


\k^l 


\k^l 


— 21E E ii^"-fc+ii 

k—p-\-l 




, fe=l 


(27) 

where we have used in the first inequality twice the character¬ 
ization of the nuclear norm as 

||X||,=sup{tr(y^X):||y||<l} 

with Y = UV'^ such that ||{71/^|| = 1, and Y — W such 
that ||iy|l < 1 , and in the second last equality the unitary 
invariance of the nuclear norm. 

Using the bound on \\hx*{W )\\2 in ||/ia* iy^^) II 2 ~ 
the sub-intervals of the algorithm are of length at most 

\y.l A,- — _ 


2Cr} 


Hence, we need at least 

2Cyi(Aniax Amin) 


evaluations of Q- Since (Amin, A^ax) = ( 0 , II50II2) according 
to we obtain ( [T9| ). If we set VU = 0 in ( |27| ) we obtain 

□ 

Proof of Theorem \3.2\ If the first argument in ( [T5| ) is ig¬ 
nored Algorithm 2 will use a greater or equal number of 
grid points. In this case. Algorithm 2 evaluates ([T]) for 
A = A*,f = l,...,MAig 2 , obtained from (A*_|_i) = e. 
As in (151, sa* (A) = n (A^ — (A*)^), which gives the rela¬ 
tion 


A* 


i+l 


- ( + 


1/2 


By induction in i, with Aq = 0, this becomes 


= ( - 


1/2 


Given that Amax = || 5 o|l 2 > as in (j^, the largest integer M, 


such that 


(^e) ^ < II 50 II 2 obeys MAig 2 = 

Since this is a worst-case scenario we obtain the upper bound 
in ( | 2 T] i. □ 
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